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Abstract 

> 

Hamiltonian Monte Carlo (HMC) improves the computational efficiency of the Metropo- 
^ | lis algorithm by reducing its random walk behavior. Riemannian Manifold HMC (RMHMC) 

further improves HMC's performance by exploiting the geometric properties of the parameter 
CN space. However, the geometric integrator used for RMHMC involves implicit equations that 

require costly numerical analysis (e.g., fixed-point iteration). In some cases, the computa- 
tional overhead for solving implicit equations undermines RMHMC's benefits. To avoid this 
problem, we propose an explicit geometric integrator that replaces the momentum variable in 
RMHMC by velocity. We show that the resulting transformation is equivalent to transform- 
ing Riemannian Hamilton dynamics to Lagrangian dynamics. Experimental results show that 
our method improves RMHMC's overall computational efficiency. All computer programs 



in order to allow replications of the results reported in this paper. 



and data sets are available online (http://www.ics.uci.edu/~babaks/Site/Codes.html) 

> 

in 

1 Introduction 



Hamiltonian Monte Carlo (HMC) (Duane et al. , 1987) reduces the random walk behavior of 



CN Metropolis by proposing samples that are distant from the current state, but nevertheless have 
a high probability of acceptance. These distant proposals are found by numerically simulating 
Hamiltonian dynamics for some specified amount of fictitious time (Neal, 2010). Hamiltonian dy- 

^ namics can be represented by a function, known as the Hamiltonian function, of model parameters 
6 and fictitious momentum parameters p ~ N(0, M) (with the same dimension as 6) as follows: 

H(0,p) = -logp{0) + ^p T M- l p (1) 

where M is a symmetric, positive-definite mass matrix. 

Hamilton's equations, which involve differential equations of H, determine how and p change 
over time. In practice, however, solving these equations exactly is too hard, so we need to ap- 
proximate them by discretizing time, using some small step size e. For this purpose, the leapfrog 
method is commonly used. 
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Figure 1: The first 10 iterations in sampling from a banana shaped distribution with random 
walk Metropolis (RWM), Hamiltonian Monte Carlo (HMC), and Riemannian Manifold HMC 
(RMHMC). For all three methods, the trajectory length (i.e., step size times number of integration 
steps) is set to 1. Solid red lines are the sampling path, and black circles are the accepted proposals. 



As the dimension grows, the system becomes increasingly restricted by its smallest eigen- 
direction, requiring smaller step sizes to maintain the stability of numerical discretization. |Giro-| 



lami and Calderhead (2011 ) proposed a new method, called Riemannian Manifold HMC (RMHMC), 



that exploits the geometric properties of the parameter space to improve the efficiency of stan- 
dard HMC. Simulating from the resulting dynamic, however, is computationally intensive since it 
involves solving two implicit equations, which require additional iterative numerical analysis (e.g., 
fixed-point iteration). 

To increase RMHMC's speed, we propose a new integrator that is completely explicit: we 
propose to replace momentum with velocity in Riemannian Manifold Hamilton dynamics. As we 
will see, this is equivalent to using Lagrangian dynamics as opposed to Hamiltonian dynamics. By 
doing so, we eliminate one of the implicit steps in RMHMC. Next, we construct a time symmetric 
integrator to remove the remaining implicit step in RHHMC. This leads to a sampling scheme, 
called e- RMHMC, that involves explicit equations only. 

In what follows, we start with a brief review of RMHMC and its geometric integrator. Section 
3 introduces our proposed semi-explicit integrator based on defining Hamiltonian dynamics in 
terms of velocity as opposed to momentum. Next, in Section 4, we eliminate the remaining 
implicit equation and propose a fully explicit integrator. In Section 5, we use simulated and real 
data to evaluate our methods' performance. Finally, in Section 6, we discuss some possible future 
research directions. 
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2 Riemannian Manifold Hamiltonian Monte Carlo 



As discussed above, although HMC explores the parameter space more efficiently than random 
walk Metropolis does, it does not fully exploits the geometric properties of parameter space de- 
fined by the density p(0). Indeed, Girolami and Calderhead (2011) argue that dynamics over 
Euclidean space may not be appropriate to guide the exploration of parameter space. To address 
this issue, they propose a new method, called Riemannian Manifold HMC (RMHMC), that ex- 
ploits the Riemannian geometry of the parameter space (Amari and Nagaoka 2006[ ) to improve 
standard HMC's efficiency by automatically adapting to the local structure. They do this by 
using a position-specific mass matrix M = G(0). More specifically, they set G(0) to the Fisher 
information matrix. As a result, p = G{0)0 ~ A/"(0, G(0)), and Hamiltonian is defined as follows: 



H(0,p) 



logp(0) + ^logdetG(0) + ip T G(e)" 1 p = 0(0) + ^ P T G(0)- 1 p (2) 



where 0(0) := — logp(0) + ^logdetG(0). Based on this dynamic, Girolami and Calderhead 
(2011) propose the following HMC on Riemmanian manifold: 

V p tf(0,p) = 







G(0 



,-i 



P 



V if(0,p) 



-v 0(0) + M0, P ) 



(3) 



Using the shorthand notation di 
j/(0,p) is 



d/d0i for partial derivative, the ith element of the vector 



(i/(0,p)), = -p T 9 l (G(0)- 1 )p = (G(0)- 1 p) T 5 l G(0)G(0)- 1 p 

The above dynamic is non-separable (it contains products of and p), and the resulting map 
(0, p) — > (0*, p*) based on the standard leapfrog method is neither time-reversible nor symplectic. 



Therefore, the standard leapfrog algorithm cannot be used for the above dynamic (Girolami and 



Calderhead, 2011). Instead, we can use the Stomer-Verlet (]Verlet[ 1967p method as follows: 

(n+l/2) _ In) 



P 



g(n+l) 



e 

2 

e 



V 0(0 (n) ) - ^(0 (n) ,p (n+1/2) ) 



+ 1 ^G^ 1 (0 (n) ) + G- 1 (0 (n+1) ; 



p 



(n+l/2) 



(n+l/2) 



V 0(0 ( " +1) )--^(0 (n+1) ,p (n+1/2) 



(4) 
(5) 
(6) 



This is also known as generalized leapfrog (Leimkuhler and Reich, 2004). The above series of trans- 
formations are (i) deterministic (ii) reversible and (iii) volume-preserving. Therefore, the effective 
proposal distribution is a delta function 5((0^\ p^), {0^ L \ p^) and the acceptance probability 
is as follows: 

exp(-H(9W,pW)) i((» ( ",p< 1 )),(9 <I) ,p( I >)) 



Here, (0 , p^) is the current state, and (0 < - L \p^ L ^) is the proposal after L leapfrog steps. 
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As an illustrative example, Figure [T] shows the sampling paths of random walk Metropolis 



(RWM), HMC, and RMHMC for an artificially created banana-shaped distribution (See Girolami 



and Calderhead, 2011 , discussion by Luke Bornn and Julien Cornebise). For this example, we fixed 



the trajectory and chose the step sizes such that the acceptance probability for all three methods 
remains around 0.7. RWMH moves slowly and spends most of iterations at the distribution's 
low-density tail, and HMC explores the parameter space in a tortuous way, while RMHMC moves 
directly to the high-density region and explores the distribution more efficiently. 

One major drawback of this geometric integrator, which is both time-reversible and volume- 
preserving, is that it involves two implicit functions: Equations Q and (j5|. These functions 
require extra numerical analysis (e.g. fixed-point iteration), which results in higher computational 
cost and simulation error. To address this problem, we propose an alternative approach that uses 
velocity instead of momentum. 



3 Moving from Momentum to Velocity 

In the Hamiltonian dynamic (|3]), the product of G(0) _1 and p is in fact velocity, v = G(0) x p. 
This motivates us to define the dynamic in terms of v instead of p. The transformation p i— > v 
changes the Hamiltonian dynamics ^ to the following form (derivation in Appendix [A| : 

= v 

(8) 

v = -r 7 (0,v)-G(0)- 1 V </)(0) 

where 77(0, v) is a vector whose kth element is , r^-(0)vV. Here, 1^(0) := §X/jS fcZ (^»Sy + 
djgu — digij) is Christoffel symbol whose (i, j)th element is G(0) = (gy)- Further, G(0) _1 = (g y '). 

This transformation moves the Hamiltonian dynamic's complexity from its first equation for 
to its second equation. In this way, we resolve one implicit function in the generalized leapfrog 
method and develop a semi-explicit integrator as follows: 

v (n+l/2) = v (")_£[( v (n+V2))T r ^(n)^ v („+l/2) +G ^(n))-l V ^^(n))] (g) 
0(n+l) = !r.) + £V (n+l/2) (1Q) 



,(*»+!) 



v (n+l/2) _ £ [(v (n+l/2) )Tr(6) (n + l) )v (n + l/2) + Q^+Dyl Ve 0(0("+ 1 ))] (H) 



2 



Note that updating v remains implicit (more details are available in Appendix |B[). 

In general, the new dynamic ^ cannot be recognized as a Hamiltonian dynamic of (0, v). Nev- 
ertheless, it remains a valid proposal-generating mechanism, which preserves the original Hamilto- 
nian H(6, p = G(0)v) (proof in Appendix|A]); thus, the acceptance probability is only determined 
by a discretization error from the numerical integration, as before. 

Because p ~ JV(0, G(0)), the distribution of v = G(0)- x p is Af(0, G(0)- 1 ). Therefore, we 
have 

p(v) = —= / exp(--v T G(0)vl oc (detG(0)) 1/2 exp(--v T G(0)vi 
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Algorithm 1 Semi-explicit Riemannian Manifold Lagrangian Monte Carlo (RMLMC) 

Initialize 0^ = current 

Sample new velocity v« ~ Af(0, G _1 (0 (1) )) 

Calculate current E(0^, vt 1 )) according to equation (23) 

for n — 1 to L (leapfrog steps) do 

% Update the velocity with fixed point iterations 

^(0) = y (n) 

for % — 1 to NumOfFixedPoint Steps do 

v » = v (n) _ |G(6» (n) )- 1 [(v( i - 1 )) T f(0 (n) )v( i - 1 ) + V 0(0 (n) )] 
end for 

y (n+l/2) _ ^.(Zast i) 

% Update the position only with simple one step 

0(n+l) _ g(n) _j_ £v (n+l/2) 

Alogdet n = logdet(I-£(v( n+1 / 2 )) T r(6/ (n+1) )) -logdet(I + e(v(" +1 / 2 )) T r(6> (n) )) 
Update the velocity exactly 

y (n+l) = y (n+l/2) _ |Q(0(n+l))-l[( v (n+l/2))Tp^(n+l)^ v („+l/2) + V 0(0 (n+1) )] 

end for 

Calculate proposed E(0^ L+1 \ v( i+1 )) according to equation (23) 
logRatio = — ProposedH + CurrentH + ^2n=i Alogdet n 
Accept or reject according to Metropolis ratio 



We define the energy function E(0, v) as the sum of the potential energy, U(0) and K(0, v), where 
tf(0,v) = -log(p(v)). 

Analogous to RMHMC, thebacceptance probability is calculated based on E(0,v), which is 
the negative log of the joint density of parameter and the new auxiliary variable v. Therefore, 
we can apply the generalized leapfrog scheme to this new dynamic (j8l) and derive a semi-implicit 
method. (See more details in Appendix|B}) Although the resulting integrator is not symplectic, we 
nonetheless have detailed balance with volume correction. Algorithm [T] shows the corresponding 
steps for implementing this method. 

In Appendix [Cj we show that the new dynamic p]) is essentially a Lagrangian dynamic. There- 
fore, we refer to the derived algorithm (Algorithm nj) as Riemannian Manifold Lagrangian Monte 
Carlo (RMLMC), which explores the parameter space along the path on a Riemannian mani- 
fold that minimizes the total Lagrangian. We can use this new proposal-generating mechanism, 
RMLMC, which is based on an Euler-Lagrange system (pi) of (0, v) instead of the original Hamil- 
tonian system ^ defined in terms of (0, p). The two methods use equivalent dynamics but differ 
numerically in the following way: RMHMC augments parameter space with momentum, while 
RMLMC augments parameter space with velocity. Later, we will show that switching to velocity 
leads to substantial improvement in computational efficiency 
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Algorithm 2 Explicit Riemannian Manifold Lagrangian Monte Carlo (e-RMLMC) 

Initialize 0^ = current 6 

Sample new velocity v« ~ Af(0, G(6/ (1) )- 1 ) 

Calculate current E(0^\\^) according to equation (23) 

A log det = 

for n = 1 to L do 

A log det = Alogdet-det(G(0 (n) ) + e/2f2(0 (n) , v^)) 

Update the velocity explicitly with a half step: 

v( ri + 1 / 2 ) = [G(6/ (n) ) + |^(6/ (n) ,v(™))]- 1 [G(6/ (n) )v (n) -|V 0(0 {rt) )] 

A log det = Alogdet + det(G(6> {n) ) -£/20(6l (n) ,v( n+1 / 2 ))) 

Update the position with a full step: 
0(»+i) _ _j_ sv (n+|) 

A log det = Alogdet-det(G(0 (n+1) ) +£/2S7(0 (n+1) , v (™ +1 / 2 ))) 
Update the velocity explicitly with a half step: 

V (n+1) = [ G ( (n+1)) + |f2(0 (n+1) , v (n+l/2))]-l [Q(6> (n+1 )) V ( n+1 / 2 ) - | V e 0(6» (n+1) )] 

A log det = Alogdet + det(G(0 (n+1) ) -£/2S7(0 (n+1) , v ( n+1 ))) 
end for 

Calculate proposed E(6^ L+1 \ v^ L+1 )) according to equation (23) 
logRatio = — ProposedE + CurrentE + A log det 
Accept or reject the proposed state according to (15) 



4 Explicit Riemannian Manifold Lagrangian Monte Carlo 

We now propose a fully explicit integrator for Lagrangian dynamics p| as follows: 



v (n+i/2) = p + £n(0W jV («))]-i[ v (") _ |G(6> (n) )- 1 V 0(6» (n) )] (12) 

0(»+l) = 0(n) +£V (n+l/2) (13) 

V (n+D = [i+£o(6/ (n+1 \v(" + ^)]- 1 [v( n+1 / 2 )-|G(0 (n+1) )- 1 V e 0(0 (n+1) )] (14) 

where f2(0 ( - n ' ) , v^ 71 ^) is a matrix whose (z,j)th element is J^ fc vj^ r-L(0^). This integrator is (i) 
reversible and (ii) energy-preserving up to order 0(e), where e is the stepsize. The resulting 
map, however, is not volume-preserving and as such the effective proposal distribution will be the 
product of a delta function and the determinant of the transformation, 

exp(-E(6> (L) ,y( L ))) 5((0 {L \ yW), (6> (1) , v^)) 
exp(-E(0 (1 ,v( 1 ))) X 5((0 (1) ,v( 1 )),(0 (L) ,v( i ))) X 

which simplifies to 

exp(E(0 (1) , v (1) ) - E(0 (L) , v (L) )) x det J (15) 
with J the Jacobian matrix of (0 , v^) — >■ (0 ( - L - ) , v^ L ^). Detailed derivations and proofs are given 



in Appendix D 
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We refer to this approach as explicit Riemannian Manifold Lagrangian Monte Carlo (e-RMLMC). 
Algorithm^ shows the corresponding steps for this method. In this algorithm, we use £l(0,v) to 
denote G(0)fl(0, v) whose (fc,j)th element is equal to ^ vT*-(0). 

Our proposed e-RMLMC does not involve implicit functions for updating (0, v) in RMHMC. 
Because we remove multiple fixed-point iteration steps, we reduce the computation time by 0(D 2 ) 
where D is the dimension of the parameters. Additionally, using this explicit updating, we resolve 
the convergence issue faced by fixed-point iterations. The connection terms T(0) in f2 do not 
add substantial computational cost since they are obtained from permuting three dimensions of 
the array dG(0), which is computed in RMHMC. However, besides G(#) -1 , which is required 
for V 0(f)(0), e-RMLMC has two extra matrix inversions to update v, whose complexity in general 
is 0(D 3 ). Therefore, as dimension grows, the efficiency gained by removing multiple fixed-point 
iterations may be overwhelmed by this additional overhead. This is evident from our experimental 
results presented in Section |5j Faster matrix inversion algorithms could be used to alleviate this 
issue. 



5 Experimental Results 



In this section, we use simulated and real data to evaluate our methods, RMLMC and e-RMLMC, 
compared to RMHMC. Following Girolami and Calderhead (2011), we use a time-normalized 



effective sample size (ESS) to compare these methods. For B posterior samples we calculate ESS 
= B[l + 2Zfc7(fc)] _1 for each parameter and choose the minimum as the measure of sampling 
efficiency, where Ekl{k) is the sum of the K monotone sample autocorrelations estimated by the 



initial monotone sequence estimator (Geyer, 1992). All computer programs and data sets discussed 



in this paper are available online at http://www.ics.uci.edu/~babaks/Site/Codes.html 



5.1 Logistic Regression Models 



We start by evaluating our methods based on five binary classification problems used in Girolami 
and Calderhead (2011). These are Australian Credit data, German Credit data, Heart data, Pima 



Indian data, and Ripley data. For each problem, we use a logistic regression model and run 20000 
MCMC iterations. Results (after discarding the initial 5000 iterations) are summarized in Table 
[TJ and show that in general our methods improve the sampling efficiency measured in terms of 
ESS per second compared to RMHMC. 



5.2 Simulated Logistic Regression 

Next, we construct some synthetic datasets with variable numbers of observations and increasing 
dimensionality for logistic regression. The simulated results are summarized in Table [2] In general, 
our methods RMLMC and e-RMLMC improve RMHMC in minimal ESS per second, but as 
expected, such an advantage gradually diminishes as the dimension increases. 
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Data 


method 


AP 


s 


ESS 


min(ESS)/s 






RMHMC 


0.74 


2.40E-02 


(8561,9595,10262) 


23.77 


Australian 


RMLMC 


0.75 


1.90E-02 


(8038,10488,11468) 


28.24 


D=14,N 


=690 


e-RMLMC 


0.75 


1.55E-02 


(9636,10443,11268) 


41.34 






RMHMC 


0.77 


5.63E-02 


(15000,15000,15000) 


17.76 


German 




RMLMC 


0.73 


4.32E-02 


(15000,15000,15000) 


23.17 


D=24,N 


=1000 


e-RMLMC 


0.70 


3.59E-02 


(13762,15000,15000) 


25.57 






RMHMC 


0.77 


1.65E-02 


(7050,8369,8905) 


28.57 


Heart 




RMLMC 


0.78 


1.09E-02 


(10847,11704,12405) 

\ 7 7 / 


66.25 


D=13,N 


=270 


e-RMLMC 


0.76 


1.01E-02 


(10347,10724,11773) 

\ 7 7 / 


68.18 






RMHMC 


0.81 


1.25E-02 


(4325,4622,4980) 


23.10 


Pima 




RMLMC 


0.82 


7.27E-03 


(4713,5448,5576) 


43.20 


D=7,N= 


532 


e-RMLMC 


0.82 


7.04E-03 


(4839,5193,5539) 


45.85 






RMHMC 


0.78 


8.39E-03 


(15000,15000,15000) 


119.20 


Ripley 




RMLMC 


0.76 


5.07E-03 


(13498,15000,15000) 


177.43 


D=2,N= 


250 


e-RMLMC 


0.79 


4.77E-03 


(12611,15000,15000) 


176.37 



Table 1: Comparing alternative methods using five binary classification problems discussed in 



Girolami and Calderhead (2011). For each dataset, the number of predictors, D, and the number 
of observations, N, are specified. For each method, we provide the acceptance probability (AP), 
the CPU time (s) for each iteration, and the time-normalized ESS. 



Data 


Method 


AP 


s 




ESS 




min(ESS)/s 






RMHMC 


0. 


34 


1.85e+02 


(4837, 


4902, 


4968) 


26.20 


N=200,D= 


10 


RMLMC 


0. 


86 


4.44e+01 


(5000, 


5000, 


5000) 


112.60 






e-RMLMC 


0. 


33 


7.42e+01 


(3792, 


4310, 


4671) 


51.11 






RMHMC 


0. 


32 


9.56e+02 


(4727, 


4893, 


5000) 


4.95 


N=400,D= 


20 


RMLMC 


0. 


80 


1.28e+02 


(4680, 


4819, 


4857) 


36.44 






e-RMLMC 


0. 


31 


2.59e+02 


(2964, 


3543, 


3968) 


11.46 






RMHMC 


0. 


32 


3.15e+03 


(4691, 


4983, 


5000) 


1.49 


N=800,D= 


40 


RMLMC 


0. 


82 


6.88e+02 


(4749, 


4836, 


4960) 


6.91 






e-RMLMC 


0. 


31 


1.09e+03 


(2902, 


3636, 


4127) 


2.65 






RMHMC 


0. 


31 


9.87e+03 


(3712, 


4515, 


4950) 


0.38 


N=1600,D 


=80 


e-RMLMC 


0. 


83 


4.64e+03 


(4002, 


4672, 


4919) 


0.86 






e-RMLMC 


0. 


30 


1.19e+04 


(2565, 


3415, 


4081) 


0.22 






RMLMC 


0.79 


1.63e+05 


(3160, 


3959, 


4464) 


0.02 


N=3200,D 


=160 


RMLMC 


0. 


83 


1.44e+05 


(3458, 


4221, 


4676) 


0.02 






e-RMLMC 


0. 


30 


1.20e+05 


(2708, 


3548, 


4156) 


0.02 



Table 2: Time, ESS and time-normalized ESS for logistic regression with simulated datasets. 
Results are calculated on a 5,000 sample chain with a 5,000 sample burn-in session. 



S 



Sampling Path of RMHMC Sampling Path of RMLMC Sampling Path of e-RMLMC 




-2-1012 -2-1012 -2-1012 



e 1 e 1 e 1 

Figure 2: The first 10 iterations in sampling from the banana-shaped distribution with Riemannian 
Manifold HMC (RMHMC), Riemannian Manifold Lagrange Monte Carlo (RMLMC) and explicit 
RMLMC (e-RMLMC). For all three methods, the trajectory length (i.e., step size times number of 
integration steps) is set to 1.45. Solid red lines show the sampling path, and each point represents 
an accepted proposal. 



5.3 Simulating a banana-shaped distribution 

The banana-shaped distribution, which we used above for illustration, can be constructed as the 
posterior distribution of 9 = (9±, 6*2) 1 2/ based on the following model: 

y\6 ~ N{6 l + 6lal) 
6 rsj N(0,a 2 e ) 

The data {yi}}^[ are generated with 6\ + Q\ = 1, a y = 2. We set erg = 1. 

We want to investigate how the three algorithms, RMHMC, RMLMC, e-RMLMC, explore 
the parameter space. Fig|2] shows the first 10 iterations for each algorithm using fixed trajectory 
length of 1.45. We can see that RMLMC and e-RMLMC explore the parameter space according 
to its curvature and mix quickly, similar to RMHMC. 

Table [3] compares the performances of these algorithms based on 5000 MCMC iterations (after 
burning the initial 1000 iteration). For this example, RMLMC has the highest ESS/s. As discussed 
above, the additional overhead of matrix inversion for e-RMLMC occasionally overwhelms its gain 
in computational efficiency 

5.4 Finite Mixture of Gaussians 

Finally we consider finite mixtures of univariate Gaussian components of the form 

K 

p(Xj\0) = ^TCkNjXilflk,^) (16) 

fc=i 

9 



Method 


AP 


s 


ESS min(ESS)/s 


RMHMC 


0.73 


8.21e-03 


(729,1117,1506) 17.76 


RMLMC 


0.79 


5.36e-03 


(857,1317,1777) 31.99 


e-RMLMC 


0.78 


5.61e-03 


(585,1085,1585) 20.85 



Table 3: Comparing alternative methods using a banana-shaped distribution. For each method, 
we provide the acceptance probability (AP), the CPU time (s) for each iteration, and the time- 
normalized ESS. 








Figure 3: Densities used to generate synthetic datasets. From left to right the densities are in the 
same order as in Table |4| The densities are taken from McLachlan and Peel (2000) 



where is the vector of size D = 3K of all the parameters 7T k , [i k and a\ and J\f(-\fi, a 2 ) is a 
Gaussian density with mean \x and variance a 2 . A common choice of prior takes the form 



K 



p(0) = P(tti, . . . ,ttk|A) Y[Af(fj, k \m, p- l a 2 k )Xg{a 2 k \b, 



c) 



(17) 



k=l 



where T>(-\\) is the symmetric Dirichlet distribution with parameter A and IQ(-\b, c) is the inverse 
Gamma distribution with shape parameter b and scale parameter c. 

Although the posterior distribution associated with this model is formally explicit, it is com- 
putationally intractable, since it can be expressed as a sum of K N terms corresponding to all 
possible allocations of observations x\ to mixture components (Marin et al 2005, chap. 9). We 



want to use this model to test the efficiency of posterior sampling using the three methods. 
A more extensive comparison of Riemannian Manifold MCMC and HMC, Gibbs sampling and 



standard Metropolis-Hastings for finite Gaussian mixture models can be found at |Stathopoulos 
and Girolami (2011). Due to the non-analytic nature of the expected Fisher Information, 1(0), 



we use the empirical Fisher information as metric tensor, defined in (McLachlan and Peel 
chap. 2): 

G{0) = S T S - ±ss T 



2000, 



where N x D score matrix S has elements Si 



dlogp(xi\0) 



and 



Depending on allocations of 7r, we show several classical mixtures showing in the following 
Table [4] and Figure [3] Their sampling efficiency is compared in Table [5] As before, our two 
algorithms outperform RMHMC. 
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Dataset Density function Num. of 

name parameters 

Kurtotic lM(x\0, 1) + ±Af (x\0, (^) 2 ) 6 

Bimodal \M (x\ - 1, (§) 2 ) + \M (x\l, (§) 2 ) 6 

Skewed |AA(x|0,l) + iAA(x||,(i) 2 ) 6 

Tnmodal ^ (x| - §, (f ) 2 ) + ^ (z|§, (f ) 2 ) + ^ (*|0, (|) 2 ) 9 

Claw \N{x\Q, 1) + Eto 15^ - 1, (±) 2 ) 18 



Table 4: Densities used for the generation of synthetic Mixture of Gaussian data sets. 



Data 


Method 


AP 


s 




ESS 




min(ESS)/s 




RMHMC 


0. 


30 


2.54e+03 


(1524, 


3474, 


4586) 


0.60 


claw 


RMLMC 


0. 


86 


1.88e+03 


(2531, 


4332, 


5000) 


1.35 




e-RMLMC 


0. 


32 


1.46e+03 


(2436, 


3455, 


4608) 


1.67 




RMHMC 


0.79 


4.97e+02 


(4701, 


4928, 


5000) 


9.46 


trimodal 


RMLMC 


0. 


82 


2.01e+02 


(4978, 


5000, 


5000) 


24.77 




e-RMLMC 


0. 


30 


2.42e+02 


(4899, 


4982, 


5000) 


20.21 




RMHMC 


0. 


35 


2.55e+02 


(5000, 


5000, 


5000) 


19.63 


skewed 


RMLMC 


0. 


82 


1.13e+02 


(4698, 


4940, 


5000) 


41.68 




e-RMLMC 


0. 


34 


1.26e+02 


(4935, 


5000, 


5000) 


39.09 




RMHMC 


0. 


32 


2.36e+02 


(5000, 


5000, 


5000) 


21.20 


kurtotic 


RMLMC 


0. 


85 


1.27e+02 


(5000, 


5000, 


5000) 


39.34 




e-RMLMC 


0. 


31 


1.35e+02 


(5000, 


5000, 


5000) 


36.90 




RMHMC 


0. 


36 


2.69e+02 


(5000, 


5000, 


5000) 


18.56 


bimodal 


RMLMC 


0. 


81 


1.03e+02 


(4935, 


4996, 


5000) 


48.00 




e-RMLMC 


0. 


35 


1.08e+02 


(5000, 


5000, 


5000) 


46.43 



Table 5: Time, ESS and time-normalized ESS for Gaussian mixture models. Results are calculated 
on a 5,000 sample chain with a 5,000 sample burn-in session. 



6 Conclusions and Discussion 



Following the method of Girolami and Calderhead (2011) for more efficient exploration of pa- 



rameter space, we have proposed new sampling schemes to reduce the computational cost associ- 
ated with using a position-specific mass matrix. To this end, we have developed a semi-explicit 
(RMLMC) integrator and a fully explicit (e-RMLMC) integrator for RMHMC and demonstrated 
their advantage in improving computational efficiency over the generalized leapfrog (RMHMC) 
method used by Girolami and Calderhead (2011). It is easy to show that for G(6) = I, our 



method degenerates to standard HMC. 



Future directions could involve splitting Hamiltonian (Dullweber et al. , 1987 Sexton and Wein- 
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garten, 1992 Neal, 2010 Shahbaba et al. 2011) to develop explicit geometric integrators. For 



example, one could split a non-separable Hamiltonian dynamics into several smaller dynamics 
some of which can be analytically solved. A similar idea has been explored by (Chin, 2009), where 
the Hamiltonian, instead of the dynamic, is split. 

Another possible research direction could be to approximate the mass matrix (Christofell Sym- 
bols). For many large-dimensional problems, the mass matrix could be appropriately approximated 
by a highly sparse matrix. This could further improve our method's computational efficiency 
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Appendix: Derivations and Proofs 



In what follows, we show the detailed derivations of our methods. We adopt Einstein notation 
(summation convention), so whenever the index appears twice in a mathematical expression, 
we sum over it: e.g., a^o 1 := X^ a ^' ^%v l v^ := £V - r^V ' . A lower index is used for the 
covariant tensor, whose components vary by the same transformation as the change of basise.g., 
gradientwhereas the upper index is reserved for the contravariant tensor, whose components vary 
in the opposite way as the change of basis in order to compensate: e.g. velocity vector. Interested 



readers should refer to Bishop and Goldberg (1980). 



A Transformation of Hamiltonian Dynamics 

To derive the dynamic (JsJ) from the Hamiltonian dynamic (|3]), the first equation in ^ is directly 
obtained from the assumed transformation: = g kl pi = v fc . For the second equation in ([8]), we 
have 

rf(g /j (^)v J ) Oglj-i ■ .jr. i j . -j 

dt dOi 
Further, from Equation ^ we have 

Pl = -d l( />(0) + ^v T d,G(0)v = -d^ + igtf,,vV 



which means 



By multiplying G 1 = (g kl ) on both sides, we have 

v fe = 6 k V = -g w (d ig y - i$gy)vV - g kl d l( f> (18) 
Since i,j are symmetric in the first summand, switching them gives the following equations: 

v fc = -g H (d,g H - U gii )vV - g kl d l( p (19) 



which in turn gives the final form of Equation (|8j) after adding equations ( 18 ) and ( 19 ) and dividing 
the results by two: 

w k = -r£.(0)vV - g kl (0)d l( j)(e) 

Here, T k A0) := \g kl (digij + djgu — digij) is Christoffel Symbol of second kind. 
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Note that the new dynamic Q still preserves the original Hamiltonian H(6, p = G(0)v). This 
is of course intuitive, but it also can be proven as follows: 

±H(0, G(0)v) = T ^H(6, G(0)v) + y T -^H(e, G(0)v) 



V e <P(0) + VdG(0)v 



+ [-v T r(6>)v-G(6/)- 1 V 0(6/)] T G(6/)v 



= v T V 0(6>) - (V 0(6/)) T v + V (v T 9G(6>)v) - (v T f(0)v) T v 
=0+0=0 

where v T r(#)v is a vector whose kth element is rf-(0)v l v J '. The second is due to the triple 
form (v T r(#)v) T v = ryjfeV 1 v J 'v fc = |9fcgjjV*v J 'v , where T is Christoffel Symbol of first kind with 
elements Ti jk (0) := g M r^(0) = §(<9ig fc j + djg ik - d k gij). 



B Derivation of semi-explicit Riemannian Manifold La- 
grangian Monte Carlo (RMLMC) 

Consider the following generalized leapfrog integration scheme: 

edH 



,(n+l/2) 
g(n+l) 

r>+l) 



p (n)_ 

(n) + 



2 96> 



(6>^,p( n+1 / 2 )) 



9 P 1 ' p )+ dp { ' p 



p 



(n+l/2) 



0(n) jp (»+l/2^ 



We composite implicit steps for velocity and explicit step for position within a leapfrog step to 
integrate dynamic (pj) and derive the following semi-explicit integrator: 



,(n+l/2) 
^(n+l) 



;[( v (n+l/2))T r ^(n)) v («+l/2) + G(6/ (n) )~ 1 V 0(6> 



_|_ £V ("+V 2 ) 



(20) 
(21) 



.(n+l/2) 



■ [( v (" +1 / 2 )) T r(6/ (n+1) )v (n+1/2) + G(6/ (n+1) )~ 1 V 0(6/ (n+1) )] (22) 



The time-reversibility of this integrator can be shown by switching (#,v)( n+1 ) and (6,vY n ' and 
negating velocity. The resulting integrator, however, is no longer volume-preserving (see subsection 
B.l). Nevertheless, based on proposition [I] we can still have detailed balance after determinant 



adjustment. (See Liu |2001 , for more details. 

Proposition 1 (Detailed Balance Condition with determinant adjustment). Denote z = (9,p), 
z' = Tl(z) for some time reversible integrator Tl to the Lagrangian dynamic. If the acceptance 
probability is adjusted in the following way: 



a z, z 



min < 1 



exp(-F(z')) 
exp(-tf(z)) 
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detTj 



then the detailed balance condition still holds 



a(z,z')P(dz) = d(z',z)P(dz') 



Proof. 



a(z,z')P(dz) = min 1 1, 



exp(-#(z')) 
exp(-tf(z)) 



dz' 



dz 



exp(— if (z))dz 



min < exp(— if (z)), exp(— if (z')) 



dz' 




dz 


dz 


} 


dz' 



dz' 



= min < 1 



exp(-if(z)) 
exp(-ff(z')) 



dz 



dz' 



exp(-ff(z'))z' = a(z',z)P(dz') 



□ 



□ 



Therefore, the acceptance probability could be calculated based on if (0, G(0)v). However, it 
also could be also calculated as follows based on the energy function E(0, v) defined in section |3j 

E(0,v) = U(0) + K(0,v) = -\ogp(0) - ^logdetG(0) + Vg(0)v 



To show their equivalence, we note that 
as follows: 



9(0', P ') 



0(6, p) 



det(G(0')) 
det(G(0)) 



9(0', v') 



9(0, v) 



(23) 

and proved our claim 



a 



min < 1 



exp(-ff(0',p')) 
exp(-ff(0,p)) 



9(0, P ) 



min < 1 



exp(-ff(0',G(0')v')) 



mm 



mm 



mm 



exp(-ff(0,G(0)v)) 
exp{-(logp(0') + ^logdetG(0') + |v' T G(0')v')} det(G(0')) 
' exp{-(logp(0) + |logdetG(0) + |v T G(0)v)} det(G(0)) 

exp{-(logp(0') - ^logdetG(0') + ^v' T G(0')v')} 8(0', v') 
' exp{-(logp(0)-|logdetG(0) + |v T G(0)v)} 

exp(-E(0 / ,v')) 9(0', v') 
' exp(-E(0,v)) 



9(0', P ') 



9(0, p) 
9(0', v') 



9(0, v) 



9(0, v) 



9(0, v) 



B.l Volume Correction 

To adjust volume, we must derive the Jacobian determinant, det J : = 
products. 



a(e(i+i) jV (i+i)) 



using wedge 



on 



Definition 1 (Differential Forms, Wedge Product). Differential one-form a : TM — > 
a differential manifold M D is a smooth mapping from tangent space TM D to K, which can be 
expressed as linear combination of differentials of local coordinates: a = fidx 1 =: / • dx. 



For example, if f : 



is a smooth function, then its directional derivative along a vector 



v G M. D , denoted by df(v) is given by 
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then df(-) is a linear functional of v, called the differential of f at z and is an example of a 
differential one-form. In particular, dz l {v) = v l , thus 

df df 
df(v) = -^—dz l (v), then df = ^—dz l 

OZi OZi 

Wedge Product of two one-form a, (3 is a 2-form a A (3 anti- symmetric bilinear function on 
tangent space which has the following properties (a, /3, 7 one-forms, A be a square matrix of same 
dimension D): 

• a A a = 

• aA(/3 + 7) = aA/3 + aA7 (thus a A /3 = -/3 A a) 

• a A A(3 = A T a A (3 

The following proposition enables us to calculate the Jacobian determinant denoted as det J. 
Proposition 2. Let T L : (0W,vOQ) _ >. (Q( L+1 \ v ( l+1 )) be evolution of a smooth flow, then 

dO^ A dv( L+1 > = d{ ° ' V LdBM A dv« 

d(0 {1 \vW) 

Note that the Jacobian determinant det J can also be regarded as Radon-Nikodym derivative 

P(d0 (L+1) ,dv (L+1) ) 

of two probability measures: det J = 7-^ , where F(dO,dv) = p(0,v)d0dv. We 

have 

dv (n+l/2) = dv (n) _ e(v (n+l/2) ) T r(0 (n) )rfv (n + l/2) + („) rf6 ,W 

d0 (n+1) = d0 {n) + ed^ n+1 ^ 

d^ n+1 ) = d^ n+1 ^ - e(^ n+1 ^) T T(0^)d^ n+1 ^ + (**)d0 {n+1) 
where v T T(0) is a matrix whose (k,j)th element is vT^(0). Therefore, 
dO^ A dv(" +1 ) = [I - e(v^ +1 ^) T T(0^)] T dO^ A d^ n+1 ^ 

= [i _ £ (v(" +1 / 2 )) T r(6>( n+1 ))] T d0W a c/ v (" +1 / 2 ) 

= [I _ £ ( v ( m+1 / 2 )) T r(0( n+1 ))] T [I + e(^ n+1 ^) T T(0^)]- T d0^ A rfvW 

For volume adjustment, we must use the following Jacobian determinant accumulated along leap 
frog steps: 

"4 det(/ + £(v(" +1 /2))T r (6»( n ))) 



det J 



LMC 



d{0 



d(0 (1) ,v«) 



As a result, the acceptance probability becomes 

a LMC = min{l, exp(-E(0( L+1 ), v (L+1) ) + E(0« v«))| det J iMC |} 

Using this acceptance probability, we are able to derive a semi-explicit integrator for RMLMC as 
shown in Algorithm [TJ In this approach, the updates for are explicit, while updating v remains 
implicit. 
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C Connection to Lagrangian Dynamics 

We now show that the above dynamic (j8J) is indeed Lagrangian dynamic. We define Lagrangian 
as follows: 

L = Vg(0)v - 0(0) 

Using variation calculus to minimize the Lagrangian, we obtain a Euler-Lagrange equation of the 
second kind, 

dL _ ddL 
dd ~ dt~d§ 

which is 

= -0 T T(O)v - G(6)- l W e <P(0) (25) 

This is equivalent to the new dynamic (j8j by taking the time derivative on the first equation and 
equating it with the time derivative on the second. 



D Derivation of explicit Riemannian Manifold Lagrangian 
Monte Carlo (e-RMLMC) 

We now propose an additional modification of Algorithm [T] to resolve the remaining implicit 
equation (20), while keeping time-reversibility to ensure the ergodicity of the induced Markov 



chain. We do this by modifying the symmetric quadratic form in equations (20) 

„(n+V2) _ 



» 



: [(v (n) ) T r(0 (n) )v (n+1/2) + G(0 (n) )- 1 V o 0(0 (n) )] 



#0+1) 



(n) + ev 

..(n+1/2) _ 



(n+1/2) 



e 



[(v (n+1/2) ) T r(0 (n+1) )v (n+1) + G(0 (n+1) ) _1 V e 0(0 



(n+1^ 



(26) 

(27) 
(28) 



The resulting integrator is completely explicit since both updates of velocity (26) and (28) can be 
solved as follows: 



.(n+1/2) 



>+i) 



[I + |(v (n) ) T r(0 (n) )]- 1 [v (n) - |G(0 ( " ) )- 1 V«0(0 (n) )] 



(29) 



[I + £( v (n+l/2))T r( 0(n+l))]-l [v (n+l/2) _ i G (0 (n+1) )- 1 V e 0(0 (n+1) )] (30) 



D.l Convergence of Numerical Solution 



We now show that the discretization error e r 



|z(t n ) 



(0(Uv(t„))-(0 (ri) ,vM; 



(i.e. the difference between the true solution and the numerical solution) accumulated over final 



time interval [0, T], is bounded and goes to zeros as the stepsize e goes to zero. (See Leimkuhler 



and Reich (2004) for a similar proof for the generalized leapfrog method.) Here, we assume 



that F(0, v) := v T r(0)v + G(0) 1 Ve0(0) is smooth; hence, F and its derivatives are uniformly 



18 



bounded as (6, v) evolves within finite time duration T. We expand the true solution z(t n+1 ) at 

tn- 



z(t„+i) = z (tn) + z(t n )£ + ]^{t n )e 2 + o(e) 



v(t n ) 

0(t„)' 
v(t„) 



+ 
+ 



v(t„) 
-F(0(t n ),v(t n )) 

v(t n ) 

-F{0(t n ),v{t n )) 



-F{0{t n ),v(tn)) 
,10 v V."n) + 



9F rV(tn) + fF(^(tn),v(i n )) 



e z + o(eJ 



e + 0(e 2 ) 

Next, we simplify the expression of the numerical solutions z( n+1 ) 



0(n+l)' 
v (n+l) 



for the fully explicit 



integrator and compare it to the above true solutions. To this end, we rewrite equation (29) as 
follows: 

v (n+l/2) = [I + £( v (n))T r (0(«))]-l[ v W _ lQ^n)yl V 0(0 (n) )] 

= V W - [I + -( V W) T r(6» (n) )]- 1 [(vW) T r(6» (n) )v (n) + -G(0 (n) )- 1 V e 0(0 (n) )] 
2 2 

= vW - [I + |( v W) T r(6» (n) )]- 1 | J F(6» (n) , V W) 



» _ 



F(0 (n) ,v^) + + J(v^) T r(6» (n) )]- 1 [( v W) T r(6» (ri) )] J p(6» (n) ,vW) 



4 



V W _ iF(6/ (n) ,v^) + 0(e 2 ) 
2 



Similarly, from equation (30) we have 

y (n+l) = y (n+l/2) _ f._p^(n+l) jV (fH-l/2)) + 0(e 2 ) 

Substituting \(- n+1 / 2 ) in the above equation, we obtain v( n+1 ) as follows: 
v (n+i) = v (n) _ ^( («) jV W) _ £ -F(0 {n+l \ v (n) ) + 0(e 2 ) 



» 



F(0 (ri) ,v (n) )e + -[F(0 w ,v w ) - F(0 W + 0(e),v w ] + O(^) 



»1 



F(0 {n) y n) )e + O(e< 



From (29), (27), and (30), we have the following numerical solution: 

(n+l) _ , v , r>(^ 

Z - ^ v (n+l)J - ^ v (n)J + [-F(6> (n) , V ( n ))J £ + J 

Therefore, the local error is 



e n+l — || z (^n+l; 



0(t n ) ~ W ' 

y{t n ) - v (n) . 



+ 



v(t„) - vW 

■[F(0(g,v(g)-F(^,vW)i 



e + 0{e 2 



< (1 + Me)e n + 0(e 2 
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where M = csup t6 r 0T i \\VF(0(t), v(t))|| for some constant c > 0. Accumulating the local errors 
by iterating the above inequality for L = T/e steps provides the following global error: 

e n +i < (1 + Me)e n + 0(e 2 ) < (1 + Mefe n ^ + 20(e 2 ) < ■ ■ ■ < (1 + Me) n e x + nO(e 2 ) 
< (1 + M £ ) L £ + LO(£ 2 ) < (e MT + T)e 0, as e 



D.2 Volume Correction 



As before, using wedge product calculation on the system (29), (27), and (30), the Jacobian matrix 
is 



d{0 



(n+l) v (n+l)\ 



: [I + i(y(n+imy T ^(n+l)^-T^ _ £( v (n+l))T r ^(n+l))]T. 
[I + |( V W) T r(0^)]- T [I - |( v (-+l/2))T F(6 ,(«) )] T 



As these new equations show, our derived integrator is not symplectic so the acceptance probability 
needs to be adjusted by the following Jacobian determinant, det J, in order to preserve the detailed 
balance condition: 



det J 



e-LMC 



d{0 {L+1) M L+i y) 



n 



det(I - e/2(v(" +1 )) T r(0 (n+1) )) det(J - e/2{^ n+1 ^) T T{0 in) )) 



^ det(J + e/2(v("+ 1 /2))T r ( 6l («+i))) det g + e /2( v (™)) T r(6/ (n) )) 



(31) 



n 



det(G(0 (n+1) ) - £/2(v( n+1 )) T f(0 (n+1) )) det(G(0 (ri) ) - s/2(^ n+1 ^) T t(0 (n) )) 



* det(G(0 (n+1) ) + e/2(v("+ 1 /2))Tf (6/(™ +1 ))) det(G(0 (n) ) + e/2(vW) T f (0 (n) )) 

As a result, the acceptance probability is 

a e _ LMC = min{l,exp(-E(0( L+1 ),v( i+1 )) + E(0( 1 ),v( 1 )))|detJ e _ iMC |} 

We can now derive a completely explicit integrator for RMLMC defined in terms of (0, v). We 
refer to this integrator as e- RMLMC for which the corresponding steps are presented in Algorithm 
[2j In both algorithms [T] and [2j the position update is relatively simple while the computational 
time is dominated by choosing the "right" direction (velocity) using the geometry of parameter 
space. Finally, it is easy to show that for G(0) = I, our method degenerates to standard HMC 
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